MOOC visual analytics: Empowering students, teachers, researchers, and platform developers of massively open online courses
ABSTRACT
Along with significant opportunities, MOOCs pose major challenges to students (keeping track of course materials and effectively interacting with teachers and fellow students), teachers (managing thousands of students and supporting their learning progress), researchers (understanding how students interact with materials and each other), and MOOC platform developers (supporting effective course design and delivery in a scalable way). This paper demonstrates the use of data analysis and visualization as a means to empower students, teachers, researchers, and platform developers by making large volumes of data easy to understand. First, we introduce the insight needs of different stakeholder groups. Second, we compare the wide variety of data provided by major MOOC platforms. Third, we present a novel framework that distinguishes visualizations by the type of questions they answer. We then review the state of the art in MOOC visual analytics using a tabulation of stakeholder needs versus visual analytics workflow types. Finally, we present new data analysis and visualization workflows for statistical, geospatial, and topical insights. The workflows have been optimized and validated in the Information Visualization MOOC (IVMOOC) taught annually at Indiana University since 2013. All workflows, sample data, and visualizations are provided at http://cns.iu.edu/2016-MOOCVis.html.

INTRODUCTION

Sites like Class Central (https://www.class-central.com) and MOOC List (https://www.mooc-list.com) help students find relevant courses across platforms. On March 22, 2015, more than 2,000 courses by more than 50 different providers were listed. The top-five Learning Management System (LMS) providers from Class Central are:

• Coursera, a social entrepreneurship company founded by computer science professors Andrew Ng and Daphne Koller from Stanford University, offering 989 courses.
• edX, a not-for-profit enterprise with MIT and Harvard universities as founding partners, offering 473 courses.
• Canvas.net, an online course network developed and supported by Instructure, an education technology company that partners with educators, institutions, and technologists, offering 242 courses.
• Miríada X, a platform widely used to teach Spanish-language MOOCs, offering 129 courses.
• Udacity, a for-profit educational organization founded by Sebastian Thrun, David Stavens, and Mike Sokolsky, offering 74 courses.
• Google Course Builder (GCB), the open source education platform by Google, listed as ‘Independent’ in Class Central (but missing in MOOC List) and used to teach 104 courses (https://code.google.com/p/course-builder/wiki/ListOfCourses).

As the number and quality of MOOC courses increase, the number of MOOC students increases as well. In September 2015, Coursera reported that it alone had 15 million students registered for its offering of 1,000 courses in 35 languages by its 120 partner institutions; 2.5 million students had completed courses (Coursera, 2015). David Malan’s CS50x, an introductory computer science course offered by Harvard and edX, attracted 150,000 student enrollments in its 2013 offering (Malan, 2013). However, course completion rates are less than 10%, indicating a need to explore novel means, including visual analytics, to help people manage course materials, understand requirements, and track their own learning progress to ultimately increase completion rates.
Major reasons for incompletion are inability to commit time, poor prior knowledge, lecture fatigue, poor course design, clunky communication tools, and bad peer review (Colman, 2013). Students who pay a fee—even a minimal fee such as $50 for a Signature Track program—reach completion rates of 70% (Kolowich, 2013).

The remainder of this paper discusses MOOC data analyses and visualizations that aim to help MOOC students, teachers, researchers, and platform developers understand and improve learning dynamics, trajectories, and progress at the individual and aggregated levels. The subsequent section reviews the insight needs and tasks of these four user groups. The Methods section discusses data types and formats that different MOOC platforms support, and it presents data analysis and visualization workflows that address the needs of different user groups. The MOOC Visual Analytics Workflows section showcases rerunnable workflows and discusses key insights gained from MOOC data analyses. The paper concludes with an outlook on future challenges and opportunities.

RELATED WORK

There is value in the richness of real-world classroom interactions. When aiming to teach or take a MOOC, students and teachers can quickly feel like they are driving blindfolded in heavy traffic on a German autobahn that has no speed limit—they simply have no means to tell who is driving next to them, how fast they are travelling, and when to expect a major collision. In response, some teachers decide to use MOOCs as a platform for high-bandwidth delivery of lecture videos and low-bandwidth means for multiple-choice assignments that are graded automatically. However, a growing number of users—students and teachers but also learning researchers and platform developers—are embracing MOOCs as a means to “teach the world,” i.e., to improve learning outcomes for millions of students.

The challenges encountered are numerous. Some are quantitative in nature: scaling up to 100,000 students per class is non-trivial. Because most online course delivery platforms are not designed for high-volume traffic, few make it easy to effectively participate in or manage 1,000 discussion threads actively used by 100,000 students. Other challenges are qualitative in nature: teaching students from 100+ countries with vastly different expertise and cultural backgrounds requires language and time zone support but also sensitivity to cultural expectations and foreknowledge (Karen, 2015).

While some challenges are shared by students, teachers, researchers, and platform developers, others are specific to one stakeholder group. Key insight needs of the different stakeholders are discussed here in non-exhaustive lists. Exemplary visualizations are shown in Table 1, which tabulates the four different user groups (columns A-D) versus five types of analysis and visualization discussed in the section MOOC Visual Analytics (rows 1-5). References to original works are given in the lower right of each table cell. Workflows for visualizations with a red triangle in the upper right-hand corner of the cells are discussed in the section MOOC Visual Analytics Workflows and available at http://cns.iu.edu/2016-MOOCVis.html. Table 2 provides captions with context for each cell of Table 1.

Table 1. Analysis types vs. User Needs. Full-size version at http://cns.iu.edu/2016-MOOCVis.html.

Table 2. Captions for Table 1. Supporting information for each of the cells in Table 1, a full-size version of which can be found at http://cns.iu.edu/2016-MOOCVis.html.

A1: Scores vs. time invested watching course videos for students who took the 2013 (blue) and 2014 (orange) IVMOOC midterm (left) and final exam (right) and got at least 50% correct. Filled circles indicate students who earned a badge in the IVMOOC, while unfilled circles indicate other students who took the exam. The orange and blue lines indicate the trend lines for the respective years. See Figure 1.

A2: A bar graph from Canvas showing students’ daily use of online course materials. The height of each bar encodes the number of page views on a given day. Selecting a bar additionally details the date and number of participants on that date. (Instructure, 2014)

A3: Proportional symbol map of the world showing the location of IVMOOC students from 2013 (blue) and 2014 (orange). Circles are area-size coded by the number of students per country. Not all students reported their country; missing values are given in the lower part of the map. The top-five countries per year are listed in the lower left. See Figure 2.

A4: A bar graph illustrating the number of views for each video of the IVMOOC. Both the 2013 (blue) and 2014 (orange) offerings of the class are shown. Each video is categorized as either “Theory” or “Hands-On.” See Figure 4.

A5: A network graph of student collaboration during the final assignment of the IVMOOC. The project entailed working on real-world client projects. The nodes of the graph represent the students who completed the projects in groups, had designated roles, and communicated with each other via Twitter. (Börner & Polley, 2014)

B1: A histogram of the number of students who spent a given amount of time in MIT’s 2012 offering of 6.002x. Time is displayed on a horizontal log axis. The bars of the histogram are grouped and colored by how much overall progress students made in the course, measured by homework completion, taking the midterm, and earning a certificate. (Seaton, Bergner, et al., 2014)

B2: Activity over time for each student with a unique login in the 2013 IVMOOC. Tracked student actions include registering for the course (purple square), taking an examination (blue triangle), watching a YouTube video (green square), and using the course’s hashtag on Twitter (orange diamond). Students are sorted vertically by registration date. (Börner & Polley, 2014)

B3: The percentage of students earning a certificate on a country-by-country basis for Stanford’s 2013 Coursera offering of Cryptography I. Darker coloring indicates a higher percentage of certificate earners. (Dernoncourt et al., 2013)

B4: The percentage of students earning certificates (%N) accessing more than a given percent (%R) of each of the resources in MIT’s 2012 offering of 6.002x. The line graph plots usage curves for which the density of users equals the negative of the curve’s derivative. The blue histogram illustrates lecture video access, and the red histogram illustrates lecture question access. (Seaton, Bergner, et al., 2014)

B5: A directed network of student movement while working on the homework (a), the midterm (b), and the final (c), to other course components in MIT’s 2012 offering of 6.002x. The thickness of the edges encodes the number of student movements, and the size of the nodes encodes time spent on a course component. (Seaton, Bergner, et al., 2014)

C1: A stacked bar graph showing student scores per question on the midterm of the Information Visualization MOOC’s 2014 offering. A total of 142 students took the 31-question test, receiving either full credit, partial credit, or no credit for each question. See Figure 3.

C2: Line graphs of student activity, filtered to include certificate earners only, in MIT’s 2012 offering of 6.002x. Each point represents the number of times a given resource was accessed divided by the number of people active on the day of access. Students took the midterm and final exams in the blocks of time enclosed by the grey striped rectangles. (Seaton, Bergner, et al., 2014)

C3: Relative resource usage by country for MIT’s 2012 offering of 6.002x (left) and Stanford’s 2013 Coursera offering of Cryptography I (right). The country of each student is inferred from the student’s log-in IP address. (Dernoncourt et al., 2013)

C4: The average number of distinct contributors for a given thread length on the discussion forums of three successive Coursera offerings of Machine Learning (ML) and Probabilistic Graphical Models (PGM). In general, each new comment on a thread is by a new contributor, reflective of the question-and-answer behavior of the forums. (Anderson et al., 2014)

C5: An enrollment network of HarvardX courses (blue nodes) and MITx courses (red nodes). Directed edges between nodes indicate that a student who completed the course of the source node subsequently enrolled in the course of the destination node. The edges are filtered to include only those with over sixty subsequent enrollments. Node size encodes the sum of in- and out-degree. (Ho et al., 2015)

D1: The probability that users of Stack Overflow, an online question-and-answer site, take one of three actions. After completing the A1 action on the website 25 times, users earn a badge. A2 represents all other actions on the website. A3 is the “life-action” of offline activity. As users near earning a badge, they increase usage of the website as a whole and take more of the badge-encouraged action. (Anderson et al., 2013)

D2: Student activity measured by the number of observed events per day for MIT’s 2012 offering of 6.002x. This graph is among a set of interactive visualizations developed as part of MoocViz, an open access analytics platform. (Dernoncourt et al., 2013)

D3: The Abilene nationwide advanced network supports the Internet2 by providing an effective interconnect among the regional networking aggregation points, or gigaPoPs, pioneered by Internet2 universities. The GlobalNOC Real Time Atlas shows live traffic for Abilene, with high line utilization in red. (“GlobalNOC,” 2007)

D4: The number of actions per day taken by users of Stack Overflow, an online question-and-answer site, relative to the day they earned the “Electorate” badge. Among four possible actions—questions (Q), answers (A), question votes (Q-votes), and answer votes (A-votes)—users increase their Q-voting activity as they near the Electorate badge, which is awarded for Q-votes. (Anderson et al., 2013)

D5: Sankey graph by Google Analytics showing the flow of traffic on http://cns.iu.edu from July 25 to August 24, 2014. In 895 recorded sessions, most visitors came from the United States (Country / Territory). Of the 314 users who visited the home page (Starting pages), most went next to the current team page (1st Interaction). Red flows indicate drop-offs—visitors who idle or leave the site. (Ginda, 2014)
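Several of the network views above (e.g., cells A5, B5, and C5) share a common recipe: aggregate pairwise interactions into a weighted graph, then encode counts as edge thickness and time or volume as node size. The following Python sketch illustrates that recipe for a B5-style movement network; the component names and counts are hypothetical placeholders, not the real 6.002x data.

```python
# A minimal sketch of a B5-style movement network, using hypothetical
# placeholder counts: edge width encodes the number of student movements
# between course components, node size encodes minutes spent on a component.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
# (from_component, to_component, number_of_student_movements) -- hypothetical
for src, dst, moves in [("homework", "lecture videos", 420),
                        ("homework", "e-textbook", 180),
                        ("homework", "discussion forum", 95)]:
    G.add_edge(src, dst, weight=moves)

# Hypothetical total minutes spent per component, used for node sizing.
minutes_spent = {"homework": 900, "lecture videos": 1500,
                 "e-textbook": 600, "discussion forum": 300}

pos = nx.spring_layout(G, seed=42)  # deterministic layout
nx.draw_networkx_nodes(G, pos, node_size=[minutes_spent[n] for n in G])
nx.draw_networkx_edges(G, pos, width=[G[u][v]["weight"] / 100 for u, v in G.edges()])
nx.draw_networkx_labels(G, pos, font_size=8)
plt.axis("off")
plt.show()
```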
Students taking MOOCs need to be extremely organized and disciplined. While some MOOCs provide class "meet-ups" of various kinds, most MOOCs have no hand-holding or encouragement via weekly in-class teacher encounters (Kizilcec & Halawa, 2015). Students have different usage behaviors and learning needs in MOOC environments based on their demographics and learning styles (Guo & Reinecke, 2014; Liegle & Janicki, 2006). As a result, there has been a recognized need for research in personalized learning environments (PLEs), a term which encapsulates MOOCs, and for research in the challenges students face in using MOOC platforms effectively (McLoughlin, 2013). Visual analytics tools can help students keep track of:

• Key learning goals and the most efficient study strategies, e.g., how to best benefit from lecture videos, e-textbooks, notes, forums, and the internet.
• How they are performing (e.g., are major milestones reached and good grades accumulated) and how their progress compares to other students (e.g., leading or lagging on exams; see cell A1 of Table 1, enlarged in Figure 1).
• Who else is taking the course and who might be a good study partner or teammate, e.g., based on expertise, performance, or time zone (for the geographic distribution of students, see cell A3 of Table 1, enlarged in Figure 2; for collaboration patterns, see cell A5 of Table 1).

Figure 1. Exam Score vs. Time Watched. Scores vs. time invested watching course videos for students who took the 2013 (blue) and 2014 (orange) IVMOOC midterm (top) and final exam (bottom) and got at least 50% correct. Filled circles indicate students who earned a badge in the IVMOOC, while unfilled circles indicate other students who took the exam. The orange and blue lines indicate the trend lines for the respective years. (Full-size version at http://cns.iu.edu/2016-MOOCVis.html)

Figure 2. Location of IVMOOC Students. Proportional symbol map of the world showing the location of IVMOOC students from 2013 (blue) and 2014 (orange). Circles are area-size coded by the number of students per country. Not all students reported their country; missing values are given in the lower part of the map. The top-five countries per year are listed in the lower left. (Full-size version at http://cns.iu.edu/2016-MOOCVis.html)
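A student-facing view like Figure 1 requires only a scatter plot with per-cohort trend lines. The sketch below is a minimal Python version; the file exam_activity.csv and its columns (year, minutes_watched, exam_score, earned_badge) are hypothetical stand-ins for an actual platform export.

```python
# A minimal sketch of a Figure 1-style view, assuming a hypothetical export
# exam_activity.csv with columns: year, minutes_watched, exam_score, earned_badge.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("exam_activity.csv")
df = df[df["exam_score"] >= 50]  # keep students with at least 50% correct

colors = {2013: "tab:blue", 2014: "tab:orange"}
fig, ax = plt.subplots()
for year, grp in df.groupby("year"):
    color = colors.get(year, "tab:gray")
    badge = grp["earned_badge"].astype(bool)
    # Filled circles for badge earners, unfilled circles for other exam takers.
    ax.scatter(grp.loc[badge, "minutes_watched"], grp.loc[badge, "exam_score"],
               color=color, label=f"{year} (badge)")
    ax.scatter(grp.loc[~badge, "minutes_watched"], grp.loc[~badge, "exam_score"],
               facecolors="none", edgecolors=color, label=f"{year} (no badge)")
    # Least-squares trend line per cohort.
    slope, intercept = np.polyfit(grp["minutes_watched"], grp["exam_score"], 1)
    xs = np.linspace(grp["minutes_watched"].min(), grp["minutes_watched"].max(), 50)
    ax.plot(xs, slope * xs + intercept, color=color)

ax.set_xlabel("Time spent watching course videos (minutes)")
ax.set_ylabel("Exam score (%)")
ax.legend()
plt.show()
```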
Teachers (a term which here also includes course staff and others helping with teaching a course) of MOOCs need effective means such as visual analytics to keep track of and guide the activities, progress, and problems encountered by thousands of students (Mazza & Dimitrova, 2004). They need to understand the effectiveness of materials, exercises, and exams with respect to learning goals in order to continuously improve course schedules, activities, and grading rubrics; see also the course monitoring goals discussed in Stephens-Martinez, Hearst, and Fox (2014). Note that this insight is important while the course is running, but also for evaluating past courses and preparing future ones. Important metrics for teachers in a MOOC context are:

• Students’ demographics (e.g., number, background, level of expertise, age, gender, language); motivational factors such as degree or career goals or intended usage of the newly acquired knowledge; and learning styles (e.g., individual vs. team or textual vs. visual learner) (Kizilcec & Halawa, 2015).
• Student activity and learning progress indicators that make it possible to provide extra support for students who fall behind or to offer more advanced materials to students who master materials quickly (Martinez-Maldonado, Clayphan, Yacef, & Kay, 2014; Taylor, Veeramachaneni, & O’Reilly, 2014; Whitehill, Williams, Lopez, Coleman, & Reich, 2015).
• Bursts of activity, such as those caused by problems with learning materials or inappropriate student behavior, which teachers must counteract and resolve quickly. Activity bursts may also be caused by external events creating unique “teachable moments” that contextualize a particular topic or idea.
• Student performance across exercises, exams, and projects, including the analysis of who did what in a team project or how active a student was in the online discussions (Instructure, 2014); see cell B2 of Table 1 and Figure 3. “Open Learner Models” can be used to analyze group collaboration and design interfaces that enhance student learning (Clayphan, Martinez-Maldonado, & Kay, 2013; Guerra, Hosseini, Somyurek, & Brusilovsky, 2016).
• Student feedback collected via online surveys to reveal strengths and weaknesses of course materials or teaching methods and to reveal additional topics students would like the course to have covered.

Figure 3. Exam Scores by Question. Student scores per question for the midterm (left) and final exam (right) of IVMOOC 2014. (Full-size version at http://cns.iu.edu/2016-MOOCVis.html)

Researchers who study human learning and are keen to understand what teaching and learning methods work well in a MOOC environment now have massive amounts of detailed data with which to work. As all student interactions—with learning materials, teachers, and other students—are recorded in a MOOC, human learning can be studied at a level of detail never before possible. Many MOOC teachers double as learning researchers, as they are interested in making their own MOOC course work for different types of students. Visual analytics tools can help researchers study:

• Whether factors such as gender, age, education level, disciplinary background, country of origin, and language influence study strategies and learning outcomes. Related work shows that students have fundamental differences in how they interact with course material (Anderson, Huttenlocher, Kleinberg, & Leskovec, 2014; Ho et al., 2015), how they navigate through MOOCs (Guo & Reinecke, 2014; Seaton, Nesterko, et al., 2014), and ultimately how they perform (Kizilcec & Halawa, 2015).
• The popularity and temporal dynamics of student access to content. For example, Seaton, Bergner, et al. (2014) found spikes in textbook usage before examinations (see cell C2 of Table 1), indicating that students use these resources as references.
• The effectiveness of different student activities, tests, and media in teaching; see also cells B4 and B5 of Table 1 (Seaton, Bergner, et al., 2014).
• Whether students’ study strategies—such as the amount of time spent interacting with course content—correlate with grades; see also Figure 1 and the correlation sketch after this list.
• The importance of motivation and personal organization for completing a MOOC course.
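For the study-strategy question above, the first quantitative check is usually a simple correlation between time-on-content and grade. A minimal sketch follows, assuming a hypothetical per-student export student_totals.csv with columns hours_on_content and final_grade:

```python
# A minimal sketch, assuming a hypothetical per-student export
# student_totals.csv with columns: hours_on_content, final_grade.
import pandas as pd
from scipy import stats

df = pd.read_csv("student_totals.csv")
r, p = stats.pearsonr(df["hours_on_content"], df["final_grade"])
print(f"Pearson r = {r:.2f} (p = {p:.3g})")

# Grades are bounded and activity data is typically heavy-tailed, so a
# rank-based measure is often the safer companion check.
rho, p_rho = stats.spearmanr(df["hours_on_content"], df["final_grade"])
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3g})")
```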
Platform developers need to design systems that support effective course design, efficient teaching, and secure but scalable course delivery. They need to support times of high traffic and resource consumption and schedule maintenance during low-activity times. Visual analytics tools can help developers monitor:

• Aggregated user activity patterns to optimize system setup, to detect broken links, or to identify irregularities such as hackers or bots; see cells D1 to D5 of Table 1 and associated references.
• Usage of course materials to improve widely used functionality and omit irrelevant features; see cell C3 of Table 1 (Dernoncourt et al., 2013).

METHODS

This section first discusses the different data types and formats that major MOOC platforms support and then compares existing and novel data analysis and visualization workflows that address the needs of different user groups.

MOOC DATA

Before selecting one of the more than 50 existing MOOC platforms, setting up the course, and opening registration for the first students in a class, it is important to identify what data is needed to monitor student activity relevant to reaching clearly defined learning objectives. Different insight needs (see the previous section) can only be satisfied if specific types of data can be recorded by the MOOC platform, obtained via custom surveys, or accessed via existing databases and services (e.g., university student records or LinkedIn data). For example, in order to perform learning outcome assessments, one must know student knowledge and skills before and after taking the course; to examine gender differences for essay exams vs. multiple-choice exams, one must know the gender of each student.

Each MOOC LMS platform supports the collection of a wide variety of data. Canvas supports a “Course Stream” that lists recent announcements, conversation messages, assignment notifications, and discussions, and it provides “Course Analytics” that show activity such as page views and student actions over time; assignments submitted on time, late, or missing; and grades as a box-and-whisker plot per assignment/exam. GCB provides guidance on how to collect data in three basic categories: "assessments", which cover data from homework and tests tracked within GCB; "reach and engagement", which covers both the location of students enrolled in the course and the activity of each student, acquired via Google Analytics; and "happiness", which comprises student responses to surveys about how satisfied they are with specific aspects of the course (https://code.google.com/p/course-builder/wiki/MeasureEfficacy).

Here, we distinguish four general types of data: demographic, performance, activity, and feedback data. Each type is explained subsequently; a minimal aggregation sketch follows the definitions.

Demographic Data: General student demographics, including age, gender, language, education level, and location. Demographic data is commonly acquired during the registration process, and additional demographic data can be acquired via the feedback surveys discussed below.

Performance Data: Student performance based on graded assessments. This is generally collected from homework, quizzes, and examinations, but it also includes results from pre-course surveys designed to examine student knowledge before they take the course.

Activity Data: How students are using class resources, such as the time and date of watching videos, reading material, turning in homework, taking quizzes, or using the discussion forum. Most platforms break down usage by content and media type (i.e., page views, assignment views, textbook views, video views). Following students’ paths through the content via inbound and outbound links (see cell D5 in Table 1) is important for understanding learning trajectories.

Feedback Data: Student input and feedback. Feedback data allows course providers to learn more about student learning goals and motivation, their intended use of the course, and the content students hope to learn. It also contains information about what students liked or disliked in terms of course content, structure, grading, or teacher interaction.
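As a minimal example of turning raw activity data into the daily-usage bars of cell A2, the sketch below aggregates a hypothetical event-log export (activity_log.csv with columns student_id, timestamp, event_type) into page views and distinct participants per day; the file name and schema are assumptions, not any platform's actual export format.

```python
# A minimal sketch, assuming a hypothetical event-log export activity_log.csv
# with columns: student_id, timestamp (ISO 8601), event_type.
import pandas as pd
import matplotlib.pyplot as plt

events = pd.read_csv("activity_log.csv", parse_dates=["timestamp"])
views = events[events["event_type"] == "page_view"]

# One bin per day: total page views and distinct participating students.
grouped = views.set_index("timestamp").resample("D")["student_id"]
daily = pd.DataFrame({
    "page_views": grouped.size(),
    "participants": grouped.nunique(),
})

daily["page_views"].plot(kind="bar", figsize=(10, 3))
plt.ylabel("Page views")
plt.tight_layout()
plt.show()
```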
MOOC platforms differ widely in how they allow access to data. The ideal platform depends on which student, teacher, learning researcher, or platform developer needs are to be supported, as different data analyses and visualizations require rather different data inputs. However, all four data types can generally be acquired. For example, although Canvas does not directly provide the demographic data of gender, age, location, and level of education, these can be acquired using feedback surveys. Regardless of the adopted platform, API access is usually superior to manual data export or dashboard data access for real-time visual analytics. Access to Google Analytics data is typically restricted to those who run a web site and is likely not visible to students. Some data, such as performance data for all students, is only accessible to teachers, and privacy concerns require anonymization or aggregation before it can be shared with other users.

MOOC VISUAL ANALYTICS

A visualization that aims to answer all possible questions for different stakeholders is likely too complex to be understood by many users (Börner, Maltese, Balliet, & Heimlich, 2015). Instead, most visualizations aim to answer “When?”, “Where?”, “What?”, and “With Whom?” questions using temporal, geospatial, topical, and network approaches, respectively. Börner’s (2015) visualization framework is used here as a guide to review existing work and to identify appropriate workflows given a very large combinatorial space of different MOOC datasets and hundreds of different data analysis and visualization algorithms. Specifically, Table 1 provides a visual overview of exemplary visualizations, whose captions are in Table 2, organized by the four different stakeholder groups introduced in the section Related Work (columns A-D) versus five types of analysis and visualization (rows 1-5). Subsequently, we review existing work on MOOC visual analytics as well as workflows used to visualize data from the Information Visualization MOOC (http://ivmooc.cns.iu.edu), taught each spring at Indiana University and detailed in the section MOOC Visual Analytics Workflows.

Statistics Analysis and Visualization

Line graphs, correlation graphs, and box-and-whisker plots are all examples of how statistical data can be rendered visually. Shown in Table 1, cell A1 is a graph that shows the return on time investment for students in terms of class score and badges achieved; for details, see Figure 1 and its explanatory text. B1 plots the frequency of hours spent on a course by each student; color indicates the percentage of attempted assessments (none in gray, >5% in red, ... >25%—and >25% on the midterm—in blue, certificate earners in purple) (Seaton, Bergner, et al., 2014). C1 depicts statistical data relevant for learning researchers that shows which exam questions potentially need revision; for details, see Figure 3, its explanatory text, and the sketch below. D1 features data relevant for platform developers that shows how the probability of certain student activities changes after certain actions or exams are completed (Anderson, Huttenlocher, Kleinberg, & Leskovec, 2013).
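A C1-style view that flags revision candidates can be built directly from a grading export. The sketch below assumes a hypothetical file midterm_grades.csv with one row per student per question and a credit column valued full, partial, or none; questions on which few students earn full credit stand out immediately.

```python
# A minimal sketch of a C1-style stacked bar graph, assuming a hypothetical
# grading export midterm_grades.csv with one row per student per question and
# columns: student_id, question, credit ("full", "partial", or "none").
import pandas as pd
import matplotlib.pyplot as plt

grades = pd.read_csv("midterm_grades.csv")
counts = (grades.groupby(["question", "credit"]).size()
                .unstack(fill_value=0)
                .reindex(columns=["full", "partial", "none"], fill_value=0))

# Questions where few students earn full credit are candidates for revision.
counts.plot(kind="bar", stacked=True, figsize=(12, 4))
plt.xlabel("Exam question")
plt.ylabel("Number of students")
plt.legend(title="Credit")
plt.tight_layout()
plt.show()
```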
Journal: JASIST, Volume 68, 2017.